@gbaraldi
Member

Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT
as the minimum. Use alignment-sized integer chunks for the alloca type
(matching emit_static_alloca) so SROA splits allocations into aligned pieces
for better performance and vectorization.

Also adds the missing setAlignment call in splitOnStack.

Co-Authored-By: Claude Opus 4.5 [email protected]
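As a rough illustration of the "alignment-sized integer chunks" idea in the description, the sketch below builds the textual LLVM type such an alloca might use. It is not the Julia source: `chunked_alloca_type` is a hypothetical helper, and rounding the size up to a whole number of chunks is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the Julia codebase): describe the alloca type
// for `size` bytes when every element is exactly `align` bytes wide.
std::string chunked_alloca_type(uint64_t size, uint64_t align) {
    // Alignments are powers of two.
    assert(align != 0 && (align & (align - 1)) == 0);
    // Assumption: round the size up to a whole number of aligned chunks.
    uint64_t chunks = (size + align - 1) / align;
    std::string elem = "i" + std::to_string(align * 8);
    if (chunks <= 1)
        return elem;  // a single chunk needs no array wrapper
    return "[" + std::to_string(chunks) + " x " + elem + "]";
}
```

Under these assumptions a 16-byte object at alignment 16 becomes a single `i128`, and a 40-byte object becomes `[3 x i128]`, so a pass that splits the slot by element type would cut it on 16-byte boundaries.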

@oscardssmith oscardssmith added the compiler:codegen (Generation of LLVM IR and native code) label Jan 15, 2026
@gbaraldi gbaraldi added the backport 1.10, backport 1.11, backport 1.12, and backport 1.13 labels and removed the backport 1.11 and backport 1.10 labels Jan 15, 2026
@gbaraldi gbaraldi requested review from vchuravy and vtjnash January 15, 2026 14:27
if (sz > 1)
align = MinAlign(JL_SMALL_BYTE_ALIGNMENT, NextPowerOf2(sz));
// Inherit alignment from the original allocation, with GC alignment as minimum.
Align align(std::max((unsigned)orig_inst->getRetAlign().valueOrOne().value(), (unsigned)JL_SMALL_BYTE_ALIGNMENT));
Member

This feels loosely unsound, since this is the minimum known alignment, not the required alignment. JL_SMALL_BYTE_ALIGNMENT is the largest alignment that julia.gc_alloc_obj is permitted to return, so it is sometimes reasonable to use it as a hint, but we should clarify that this overalignment is merely a hint to the layout (although going above 16 will penalize performance, since it requires a more expensive stack adjustment on entry).

    (unsigned)orig_inst->getRetAlign().valueOrOne().value()

Member Author

But if the allocation required a larger alignment, wouldn't we inherit that? That's why we prefer to inherit and use the GC alignment only as a baseline.

Member

@vtjnash vtjnash Jan 15, 2026

getRetAlign is not the required alignment; it is the minimum known alignment. So if the allocation required more, this would introduce a bug here.

That said, the function isn't capable of giving more than JL_SMALL_BYTE_ALIGNMENT (16), so having getRetAlign return more than 16 here would be a miscompile. This therefore always gives the correct answer anyway (and increasing from there is only a runtime performance penalty, not a correctness issue).
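The argument in this comment can be sketched in plain C++ without LLVM. The function below is a hypothetical stand-in for the expression under review; the name `stack_slot_align`, the constant `kSmallByteAlignment`, and the use of 0 to mean "no alignment attribute" are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>

// Assumed value, taken from the discussion (JL_SMALL_BYTE_ALIGNMENT is 16).
constexpr unsigned kSmallByteAlignment = 16;

// Hypothetical stand-in for
//   max(getRetAlign().valueOrOne(), JL_SMALL_BYTE_ALIGNMENT).
// `ret_align` is the minimum known alignment of the call result;
// 0 models a call with no alignment attribute.
unsigned stack_slot_align(unsigned ret_align) {
    unsigned known = ret_align == 0 ? 1u : ret_align;  // mirrors valueOrOne()
    // The allocator cannot promise more than 16 bytes, so a larger value
    // would already indicate a miscompile upstream.
    assert(known <= kSmallByteAlignment);
    return std::max(known, kSmallByteAlignment);
}
```

For every input that satisfies the assertion the result is 16, which is why the expression gives a valid answer even though it reads a lower bound rather than a requirement.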

Member Author

It’s the alignment we emitted, no? If it required more, that would already be a bug, no? Unless you mean an aligned GC allocation that doesn’t tell LLVM about its alignment.


; CHECK-LABEL: @ccall_ptr
; CHECK: alloca i64
; CHECK: alloca i128, align 16
Member

cc: @maleadt

This might cause issues for the Intel backend? If I recall correctly, they don't like i128.

Member

Pretty sure our Int128 emits i128.

Member Author

This i128 should be storage-only most of the time; I just followed what we do for base Julia.

@DilumAluthge DilumAluthge mentioned this pull request Jan 15, 2026
40 tasks
Some backends don't support integer types larger than 64 bits, so cap
the element size used in emit_static_alloca and AllocOpt at i64. For
allocations larger than 8 bytes, use arrays of i64 instead of i128/i256.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
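A minimal sketch of the capping rule this commit message describes, written without LLVM so it stands alone. `capped_alloca_type` is a hypothetical helper, and padding the size up to the alignment before dividing into elements is an assumption, not something the commit message states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the Julia codebase): choose an alloca type
// whose integer elements never exceed 64 bits.
std::string capped_alloca_type(uint64_t size, uint64_t align) {
    // Alignments are powers of two.
    assert(align != 0 && (align & (align - 1)) == 0);
    // Cap the element width at 8 bytes (i64), whatever the alignment is.
    uint64_t elem_bytes = std::min<uint64_t>(align, 8);
    // Assumption: pad the size up to a multiple of the alignment.
    uint64_t padded = (size + align - 1) / align * align;
    uint64_t count = padded / elem_bytes;
    std::string elem = "i" + std::to_string(elem_bytes * 8);
    if (count <= 1)
        return elem;  // a single element needs no array wrapper
    return "[" + std::to_string(count) + " x " + elem + "]";
}
```

With these assumptions a 16-byte slot at alignment 16 is described as `[2 x i64]` rather than `i128`; the alignment itself would still be carried separately on the alloca.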
@gbaraldi gbaraldi added the merge me (PR is reviewed. Merge when all tests are passing) label Jan 20, 2026
@oscardssmith oscardssmith merged commit 54fde7e into master Jan 21, 2026
9 checks passed
@oscardssmith oscardssmith deleted the gb/alloca-align branch January 21, 2026 05:44
@oscardssmith oscardssmith removed the merge me (PR is reviewed. Merge when all tests are passing) label Jan 21, 2026
KristofferC pushed a commit that referenced this pull request Jan 26, 2026
(cherry picked from commit 54fde7e)
@KristofferC KristofferC mentioned this pull request Jan 26, 2026
43 tasks
KristofferC pushed a commit that referenced this pull request Jan 26, 2026
(cherry picked from commit 54fde7e)
@KristofferC KristofferC removed the backport 1.12 and backport 1.13 labels Feb 3, 2026